Day 1
Meaning? Depends on whom you ask.
Frequentist: Essentially, the (long-run) relative frequency (or proportion) of an event happening
Bayesian: Essentially, the relative plausibility of an event happening given what we already know about what generates events and what we actually observe (i.e., data)
Which one is best?
NEITHER
Both are useful
Whether “frequentist” or “Bayesian”, probabilities obey the same rules:
Union (mutually exclusive): \(Pr(A \cup B) = Pr(A) + Pr(B)\), if \(Pr(A \cap B) = 0\)
Intersection: \(Pr(A \cap B)\), the probability that A and B both occur
Union (not mutually exclusive): \(Pr(A \cup B) = Pr(A) + Pr(B) - Pr(A \cap B)\)
Joint probability: \(Pr(A) \cdot Pr(B)\), if A and B are independent
Independence: \(Pr(A|B)=Pr(A)\) and \(Pr(B|A)=Pr(B)\)
Conditional probability: \(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\)
\(\cup\): read it as “probability of either A OR B or both occurring”
\(\cap\): read it as “probability of A AND B simultaneously occurring”
Note that, under independence between A and B (where \(Pr(A|B)=Pr(A)\)):
\(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\\Pr(A)\cdot Pr(B)=Pr(A \cap B)\)
While, without independence between A and B:
\(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\\Pr(A|B)\cdot Pr(B)=Pr(A \cap B)\)
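A minimal numeric check of these rules, using a fair six-sided die as an illustrative example (the events A = “even” and B = “three or less” are assumptions for the sketch):

```python
from fractions import Fraction

# Fair six-sided die (illustrative assumption)
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}        # "even"
B = {1, 2, 3}        # "three or less"

def pr(event):
    # Probability as relative frequency over a finite, equally likely sample space
    return Fraction(len(event), len(omega))

# Union (not mutually exclusive): Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)

# Conditional probability: Pr(A|B) = Pr(A ∩ B) / Pr(B)
pr_A_given_B = pr(A & B) / pr(B)
print(pr_A_given_B)               # 1/3

# A and B are NOT independent here: Pr(A ∩ B) != Pr(A) * Pr(B)
print(pr(A & B), pr(A) * pr(B))   # 1/6 vs 1/4
```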
Discrete measures 👉 probability
Continuous measures 👉 density
The probability for any specific value of a continuous measure is \(0\)
Densities are related to (but not exactly the same as) probabilities
Both still obey the rules of probability: pdfs integrate to 1; pmfs sum to 1
CDFs map a measure to the probability of it assuming a specific (or lower) value
Usually written as: \(F(x) = Pr(X \leq x)\)
CDFs exist for both continuous and discrete measures
Quantiles are values of a measure that split its pdf (or pmf) into regions of given probability
Example: percentiles split a probability distribution into 100 intervals of equal probability
Example: the median is the 2nd quartile (i.e., the 50th percentile)
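A quick numeric sketch of these two examples (simulated data; the seed, mean, and SD are illustrative assumptions, not Gentoo estimates):

```python
import numpy as np

rng = np.random.default_rng(42)                    # fixed seed, illustrative
x = rng.normal(loc=5000, scale=300, size=10_000)   # e.g. body masses in grams

# The median is the 2nd quartile, i.e. the 50th percentile
median = np.quantile(x, 0.50)
q2 = np.percentile(x, 50)
print(median == q2)   # True: same computation, two interfaces

# Percentiles slice the distribution into 100 pieces of equal probability:
# ~1% of observations fall between two consecutive percentiles
p = np.percentile(x, [10, 11])
share = np.mean((x >= p[0]) & (x < p[1]))
print(round(share, 3))   # ≈ 0.01
```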
The Fantastic 4
d*, p*, q*, r*
d*: compute density (cont.) or probability (discr.)
p*: returns \(Pr(measure\leq quantile)\) (mind the lower.tail argument)
q*: returns the quantile for a given \(Pr(measure\leq quantile)\) (mind the lower.tail argument)
r*: draw random values of measures from a model
Examples:
Gaussian: dnorm, pnorm, qnorm, rnorm
Binomial: dbinom, pbinom, qbinom, rbinom
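For readers working outside R, the same four operations exist in Python's scipy.stats; a sketch of the mapping (dnorm/pnorm/qnorm/rnorm correspond to pdf/cdf/ppf/rvs):

```python
from scipy.stats import norm, binom

# d*: density (continuous) or probability (discrete)
print(norm.pdf(0))                 # like dnorm(0)
print(binom.pmf(3, n=10, p=0.5))   # like dbinom(3, 10, 0.5)

# p*: Pr(measure <= quantile)
print(norm.cdf(1.96))              # like pnorm(1.96), ~0.975

# q*: quantile for a given Pr(measure <= quantile)
print(norm.ppf(0.975))             # like qnorm(0.975), ~1.96

# p* and q* are inverses of each other
assert abs(norm.cdf(norm.ppf(0.975)) - 0.975) < 1e-12

# r*: draw random values from the model
draws = norm.rvs(loc=0, scale=1, size=5, random_state=1)
```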
Data: information we have available
Model: a set of assumptions to describe a simplified version of reality
Parametric model: Model described by parameters (see pdf(s) and pmf(s))
Probability: how measures behave according to our model
We have data and models, what do we do now?
Let’s use data to estimate model parameters!
| ID | BodyMass |
|---|---|
| 1 | 5085.467 |
| 2 | 4983.132 |
| 3 | 4384.706 |
| 4 | 4773.966 |
| 5 | 5224.501 |
| 6 | 5272.518 |
| 7 | 4467.005 |
| 8 | 4892.681 |
Assumption: Body mass of (all existing) Gentoo penguins is normally distributed with some mean and variance
Parametric model: \(Gentoo\hspace{1 mm}body\hspace{1 mm}size \sim \mathcal{N}(\mu,\, \sigma^{2})\)
Maximizing the joint probability of the data given the parameters finds the parameter value(s) that make the observed data most likely (under the assumed model)!
Likelihood(parameters|data) = Probability(data|parameters)
Link data, model and L
Data: sample of \(n\) penguins on which we measure BM
Model:
\(BM \sim \mathcal{N}(\mu,\,\sigma^{2})\)
Probability (density) for \(BM_i\):
\(f(BM_i) = \frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{BM_i-\mu}{\sigma}\right)^2}\)
L (given model):
\(\prod\limits_{i=1}^{n} \frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{BM_i-\mu}{\sigma}\right)^2}\)
“Move” along combinations of \(\mu\) and \(\sigma^2\) and find those that maximize L
We usually maximize the log-likelihood (LL) for two main reasons:
The log turns products into sums: \(log(\prod\limits_{i=1}^{n}X_i) = \sum\limits_{i=1}^{n}log(X_i)\)
Sums are numerically more stable than products of many small densities (which underflow), and the log is monotone, so the maximizer does not change
We will:
Estimate the population mean of Gentoo’s body mass using brute force
Estimate regression parameters for the relationship between Gentoo’s body mass and flipper length (without using brute force)
Estimate rate parameter of a Poisson population
NOW GO TO R..
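As a sketch of the brute-force idea (shown here in Python with scipy rather than R; fixing \(\sigma\) at the sample SD is a simplifying assumption to keep the grid one-dimensional):

```python
import numpy as np
from scipy.stats import norm

# The eight Gentoo body masses from the table above (grams)
bm = np.array([5085.467, 4983.132, 4384.706, 4773.966,
               5224.501, 5272.518, 4467.005, 4892.681])

# Brute force: evaluate the log-likelihood over a grid of candidate means.
# Sigma is fixed at the sample SD purely to keep the sketch 1-D.
sigma = bm.std(ddof=1)
mu_grid = np.linspace(4000, 6000, 2001)

# log L(mu | data) = sum_i log f(BM_i; mu, sigma)
ll = np.array([norm.logpdf(bm, loc=mu, scale=sigma).sum() for mu in mu_grid])

mu_hat = mu_grid[ll.argmax()]
# The grid maximizer lands within one grid step of the sample mean
print(mu_hat, bm.mean())
```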
What I think I am doing
Model:
\(\mu_i = \alpha + \beta \cdot flipper\hspace{1mm}length_i\)
What I am actually doing
Model:
\(Gentoo\hspace{1 mm}body\hspace{1 mm}size_i \sim \mathcal{N}(\mu_i,\, \sigma^{2})\)
\(\mu_i = \alpha + \beta \cdot flipper\hspace{1mm}length_i\)
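The equivalence between this model and least squares can be sketched as follows (Python, with simulated data; the true parameter values and sample size are illustrative assumptions, not Palmer-penguins estimates):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Simulated flipper lengths (mm) and body masses (g); illustrative values
flipper = rng.uniform(200, 230, size=200)
bm = -6000.0 + 50.0 * flipper + rng.normal(0, 300.0, size=200)

def loglik(alpha, beta, sigma):
    # log L(alpha, beta, sigma | data) under BM_i ~ N(alpha + beta*FL_i, sigma^2)
    mu = alpha + beta * flipper
    return norm.logpdf(bm, loc=mu, scale=sigma).sum()

# With Gaussian errors, maximizing the likelihood in (alpha, beta) is
# equivalent to minimizing the sum of squared residuals: OLS is the MLE
beta_hat, alpha_hat = np.polyfit(flipper, bm, deg=1)
resid = bm - (alpha_hat + beta_hat * flipper)
sigma_hat = np.sqrt(np.mean(resid**2))   # MLE of sigma divides by n, not n-1

# Any nudge away from the OLS line can only lower the log-likelihood
ll_best = loglik(alpha_hat, beta_hat, sigma_hat)
assert ll_best > loglik(alpha_hat, beta_hat + 1.0, sigma_hat)
assert ll_best > loglik(alpha_hat + 10.0, beta_hat, sigma_hat)
```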
\(Y \sim Pois(\lambda)\), with \(Y\) assuming values \(\geq 0\)
Pmf: \(Pr(Y) = \frac{\lambda^Y e^{-\lambda}}{Y!}\)
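A brute-force sketch for the Poisson case (Python; the simulated rate of 3.2 and sample size are illustrative assumptions): the grid maximizer of the log-likelihood lands at the sample mean, which is the analytical MLE of \(\lambda\).

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(7)
y = rng.poisson(lam=3.2, size=500)   # simulated counts, illustrative rate

# log L(lambda | y) = sum_i log Pr(Y = y_i), using the pmf above
lam_grid = np.linspace(0.5, 8.0, 751)
ll = np.array([poisson.logpmf(y, mu=lam).sum() for lam in lam_grid])

lam_hat = lam_grid[ll.argmax()]
# The analytical MLE of a Poisson rate is the sample mean
print(lam_hat, y.mean())
```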
Likelihood function \(\neq\) Pdf
We found the MLE(s). Does this mean that we now know the population parameters? NO!
MLE(s) have asymptotic properties (sample size matters)
From the curvature of the LL around its maximum, we can estimate how precisely we estimate population parameters: the sharper the peak, the more precise the estimate
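A sketch of that idea (Python; Poisson example with illustrative simulated counts): the standard error recovered from the numerical curvature of the LL at its peak matches the analytical one, \(\sqrt{\hat\lambda/n}\).

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(7)
y = rng.poisson(lam=3.2, size=500)   # illustrative simulated counts

def ll(lam):
    # Poisson log-likelihood as a function of the rate
    return poisson.logpmf(y, mu=lam).sum()

lam_hat = y.mean()                   # Poisson MLE

# Curvature of the LL at the MLE via a central finite difference;
# a sharper peak (more negative second derivative) means higher precision
h = 1e-3
curv = (ll(lam_hat + h) - 2 * ll(lam_hat) + ll(lam_hat - h)) / h**2

se_from_curvature = 1 / np.sqrt(-curv)
se_analytical = np.sqrt(lam_hat / len(y))
print(se_from_curvature, se_analytical)
```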
A re-arrangement of conditional probability:
Conditional probability: \(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\)
\(Pr(A|B)=\frac{Pr(A \cap B)}{Pr(B)}\)
\(Pr(A|B)Pr(B)=Pr(A \cap B)\)
But \(Pr(A \cap B) = Pr(B \cap A)\)
And \(Pr(B \cap A) = Pr(B|A)Pr(A)\)
So \(Pr(A|B)Pr(B) = Pr(B|A)Pr(A)\)
Dividing both sides of the equation by \(Pr(B)\), we end up with:
Bayes’ rule: \(Pr(A|B) = \frac{Pr(B|A)Pr(A)}{Pr(B)}\)
\(Pr\): we are familiar with it (pdf(s), pmf(s))
\(Pr(B|A)\): what if I tell you that \(B\) is data and \(A\) model parameters?
YES! Pr(B|A) IS THE LIKELIHOOD!
\(Pr(A)\): the prior, a model for the parameter(s) 🤯
\(Pr(B)\): marginal probability of the data
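Putting numbers on these ingredients (a classic diagnostic-test sketch; all probabilities below are illustrative assumptions):

```python
from fractions import Fraction

# Illustrative numbers: A = "condition present", B = "test positive"
pr_A = Fraction(1, 100)              # prior Pr(A)
pr_B_given_A = Fraction(95, 100)     # likelihood Pr(B|A)
pr_B_given_notA = Fraction(5, 100)   # likelihood under the alternative

# Pr(B), the marginal probability of the data, via total probability
pr_B = pr_B_given_A * pr_A + pr_B_given_notA * (1 - pr_A)

# Bayes' rule: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B)
pr_A_given_B = pr_B_given_A * pr_A / pr_B
print(pr_A_given_B)   # 19/118, roughly 0.16
```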
We gave a name to all ingredients for \(Pr(A|B)\), but what’s \(Pr(A|B)\)?
Recall our aim is to estimate model parameters
This time we won’t restrict ourselves to a unique idea of the data-generating process (DGP); we will instead consider several plausible DGPs
The plausibility of each of these scenarios results from combining what the data suggest about the model (the likelihood of the parameters given the data) and what we assume about the model even before looking at the data (the prior):
\(Pr(B|A)\cdot Pr(A)\)
Normalization constant: makes \(Pr(A|B)\) integrate to 1
\(Pr(B)\): marginal probability of the data
\(Pr(B) = \sum\limits_{i=1}^{n}Pr(B|A_i)Pr(A_i)\) from the law of total probability (LTP)
\(\sum\limits_{i=1}^{n}Pr(B|A_i)Pr(A_i) = Pr(B|A_1)Pr(A_1) + ... + Pr(B|A_n)Pr(A_n)\)
For any \(A_i\): \(Pr(A_i|B) = \frac{Pr(B|A_i)Pr(A_i)}{Pr(B|A_1)Pr(A_1) + ... + Pr(B|A_n)Pr(A_n)}\)
To avoid limiting ourselves to a single perspective on the DGP
Likelihood provides one and only one winner (the MLE)
Probably better suited for ecology & observational studies (?) - nature is complex
‘Frequentist’ approach for experiments?
Both approaches are useful
Grid approximation
Quadratic approximation
MCMC 😎
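A sketch of the first of these, grid approximation (Python; the Binomial data of 7 successes in 10 trials and the flat prior are illustrative assumptions):

```python
import numpy as np

# Grid approximation of a posterior: Binomial likelihood, flat prior.
k, n = 7, 10                      # illustrative data: 7 successes in 10 trials
p_grid = np.linspace(0, 1, 1001)  # candidate values of the parameter p

prior = np.ones_like(p_grid)                    # flat prior over p
likelihood = p_grid**k * (1 - p_grid)**(n - k)  # Pr(data | p), up to a constant

# Bayes' rule on the grid: normalize likelihood * prior so it sums to 1
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

# With a flat prior, the posterior peaks at the MLE, k/n
print(p_grid[posterior.argmax()])   # ≈ 0.7
```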